Robust Speech Translation by Domain Adaptation
نویسندگان
چکیده
Speech translation tasks usually are different from text-based machine translation tasks, and the training data for speech translation tasks are usually very limited. Therefore, domain adaptation is crucial to achieve robust performance across different conditions in speech translation. In this paper, we study the problem of adapting a general-domain, writing-textstyle machine translation system to a travel-domain, speech translation task. We study a variety of domain adaptation techniques, including data selection and incorporation of multiple translation models, in a unified decoding process. The experimental results demonstrate significant BLEU score improvement on the targeting scenario after domain adaptation. The results also demonstrate robust translation performance achieved across multiple conditions via joint data selection and model combination. We finally analyze and compare the robust techniques developed for speech recognition and speech translation, and point out further directions for robust translation via variability-adaptive and discriminatively-adaptive learning.
منابع مشابه
Domain Adaptation of a Distributed Speech-To-Speech Translation System
This paper is on the experiment to design, implement, and optimize a speech-to-speech translation system that is solely based on an appropriate combination of currently available commercial components for speech recognition, machine translation, and speech synthesis. Principal feasibility and performance improvement by domain adaptation have been investigated. We have chosen a distributed archi...
متن کاملA System for Simultaneous Translation of Lectures and Speeches
This thesis realizes the first existing automatic system for simultaneous speech-to-speech translation. The focus of this system is the automatic translation of (technical oriented) lectures and speeches from English to Spanish, but the different aspects described in this thesis will also be helpful for developing simultaneous translation systems for other domains or languages. With growing glo...
متن کاملLinguistic representation of Finnish in a limited domain speech-to-speech translation system
This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Source medical domain speech-to-speech translation system. The paper describes the collection of the medical sub-domain corpora for Finnish, the creation of the Finnish generation grammar by adapting the original English grammar, the composition of the domain specific Finnish lexicon and the definiti...
متن کاملRobust machine translation for multi-domain tasks
In this thesis, we investigate and extend the phrase-based approach to statistical machine translation. Due to improved concepts and algorithms, the quality of the generated translation hypotheses has been significantly improved in recent years. Still, the translation quality leaves a lot to be desired when going beyond traditional translation tasks, such as newswire articles, and when addressi...
متن کاملFLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging
We present FLORS, a new part-of-speech tagger for domain adaptation. FLORS uses robust representations that work especially well for unknown words and for known words with unseen tags. FLORS is simpler and faster than previous domain adaptation methods, yet it has significantly better accuracy than several baselines.
متن کامل